IMPORTANCE SAMPLING THROUGH DISTANCE MEASURES FOR ESTIMATING 
THE PROBABILITY OF A RARE EVENT


A thesis submitted 

in partial fulfilment of the requirements 
for the Degree of 


MASTER OF TECHNOLOGY 




by 


P. GOPATHY 


to the 

DEPARTMENT OF ELECTRICAL ENGINEERING
INDIAN INSTITUTE OF TECHNOLOGY, KANPUR 


JANUARY 1991. 



ACKNOWLEDGEMENT 


I am immensely indebted to Dr. Rakesh Kumar Bansal 
for having taught me the rigorous foundations of probability 
theory. I am also extremely thankful to him for his 
valuable guidance throughout the course of this thesis work. 

Friends Prabhu Hanyem, Kalyan Raman and Haja 
Mohideen helped in giving shape to this thesis and to them I 
remain indebted. Thanks to all my friends who kept me good 
company during my M.Tech. 

My thanks are also due to Mr. L.S. Bajpai for his 
efficient typing and kind cooperation. 


P. Gopathy






CERTIFICATE 


It is certified that the work contained in the thesis
entitled IMPORTANCE SAMPLING THROUGH DISTANCE MEASURES FOR
ESTIMATING THE PROBABILITY OF A RARE EVENT by P. Gopathy, has
been carried out under my supervision and that this work has
not been submitted elsewhere for a degree.


January, 1991. 


Dr. R. K. Bansal 

Department of Electrical Engineering
I.I.T. Kanpur








ABSTRACT

The problem of reducing the sample size in the 
Monte-Carlo estimation of the probability of a rare event 
through importance sampling, a variance-reduction technique, 
has an optimal solution that is degenerate. Constrained 
optimal solutions have, therefore, been obtained through ad hoc 
approaches in many specific contexts. 

In this thesis, guided by Kobayashi's theorem on the 
simultaneous minimization of all Ali-Silvey distances by the so 
called least favourable pair (in terms of Bayes's risk) in a 
composite binary hypothesis testing problem, constrained 
optimal solutions that minimise the variance of the importance 
sampling estimator are given for some group and exponential 
families. Using a lemma of Huber (1965) for a given pair to be 
least favourable, the asymptotic optimality of the biasing 
distribution obtained by a shift through threshold is 
established for the location family. 

In the second part, Importance Sampling for Hall's
Minimum Probability Ratio Tests (MPRTs) is studied in the
spirit of Siegmund's results for sequential tests.



CHAPTER - I


I. INTRODUCTION


The Monte-Carlo estimation of the probability of a
rare event has been widely used in recent years to
characterize system performance in many contexts. Full
justification for the Monte-Carlo method exists only where the
complexity of the problem, with operations involving many
interacting random factors, discourages analytical solutions
and the resulting intractability suggests the use of
simulation [22, 23]; even so, the class of problems where
Monte-Carlo methods are justifiably used is rather large. In
communications and signal processing theory, examples of such
problems are the estimation of a false alarm rate [11], the
probability of a rare event in a stochastic algorithm [3],
expectations of functionals of Markov chains, especially the
probability of overflow in queuing networks, the estimation of
bit error rate in a communication system [2, 12-14, 16, 18] and
the estimation of error probabilities for sequential tests
[19, 20]. A typical case of bit error estimation for a
communication system serves to exemplify the analytical
intractability of such problems. Direct evaluation of bit
error rates for non-linear systems perturbed by non-Gaussian
noise is generally very difficult since probability densities
on the output space may not be easily obtainable in closed
form. In such situations, one resorts to developing either
certain bounds on the errors or certain assumptions that may
allow the use of a central limit theorem. In the former case,
the tightness of the bounds and in the latter case, the
percentage error incurred are not always known and in some
cases, have been known to grow exponentially with the number of
observations [Orsak and Aazhang, 1990]. Hence the use of
Monte-Carlo estimation techniques may be justified in such
contexts.

In all such situations, the quantity to be evaluated,
usually an error probability, is extremely small, of the order
of $10^{-6}$ or less. Given a confidence interval of say, 95%, as
shown in the next chapter, for an error probability $p_e$ one
requires approximately $10/p_e$ simulations. Hence, for
$p_e \le 10^{-6}$ an inordinately large number of simulations, of
the order of $10^7$, are required to attain the given accuracy.
Apart from taxing the system used for simulation, this number
can very well exceed the period of the random number generator
used. Importance Sampling is a variance reduction technique that
aims at concentrating the simulation effort towards drawing
the majority of samples from that region in the observation
space which contributes most to the true value of the estimate.
The principle involved here is to distort or modify the input
random process in order to make the original low probability
events, which influence the value of the estimate most, occur
more frequently. This action is then compensated by weighting
the events appropriately, thus removing the bias introduced
earlier. Kahn (1953) first studied this technique in a general
framework. In communication and signal processing theory, this
technique was first introduced by Shanmugam and Balaban (1980)
and has been widely studied since [2, 3, 4, 11-18, 23] in
different contexts.

Since Importance Sampling principally aims at drawing
more samples from the "important" region, the original input
probability density is substituted with a biasing density which
assigns more weight to this important region, and samples are
now drawn from this density. Ideally, one would like to put
zero mass over the complement ("unimportant") region and
normalise the given density. But the normalising constant will
then be exactly equal to the quantity to be evaluated! This
intuitive picture tells us that the optimal biasing density
will be degenerate; it puts the problem of identifying the best
biasing density in a typical catch-22 situation.

In all the previous works, some parametric family
appropriately chosen for the specific problem at hand was used
as the constraint family of interest, from which the best
biasing member was identified by minimising an upper bound on
the variance of the estimator. Other ad hoc approaches, like
proving that a certain choice of the biasing density gives a
large reduction in the variance, have also been employed. In
Orsak and Aazhang (1990), a slightly different approach has
been used. By defining Huber's mixture neighbourhoods around
the optimal density and an arbitrarily chosen nominal density,
of radii zero and some $\varepsilon > 0$ respectively, the member
of the mixture neighbourhood around the nominal density which
forms the least favourable pair with the optimal density is
chosen as the sub-optimal biasing density. This choice is
justified by Kobayashi's theorem [Kobayashi, 1967] that the
least favourable pair in terms of Bayes's risk minimises all
Ali-Silvey distance measures and hence also the so called
Importance Sampling distance, which is proportional to the
variance of the Importance Sampling estimator. But since the
least favourable densities are only truncated-normalised
combinations of the nominal densities, this solution is
dependent on the degenerate optimal solution and is therefore
unimplementable.

In this thesis, we apply Kobayashi's theorem from a
different point of view. In Chapter II, we prove the
non-existence of the least favourable pair, when one of the two
neighbourhoods is shrunk to the optimal density and the other
is a parametric family, through counter-examples. In Chapter
III, with suitable justification, we use the Kullback-Leibler
distance measure instead of the Importance Sampling distance
and come up with implementable solutions for some group
families like the location, the scale and the location-scale
families, and also for the single-parameter exponential family.
Further, all previous works have shown, either through
brute-force simulation or by other empirical methods, that the
best biasing member within the shift family is the one with a
shift that is equal to the threshold itself. Using Huber's (1965)
lemma for a given pair to be least favourable, we have
rigorously established the asymptotic optimality of the shift
through threshold.

In Chapter IV, we consider Hall's Minimum Probability
Ratio Tests, a very general class of sequential tests that
includes Wald's SPRT, t-tests, robust tests etc. as special
cases, for the application of Importance Sampling. Siegmund's
results for sequential tests of a pair of simple hypotheses
indicate the use of the alternative as the sub-optimal biasing
density for simulation under the null hypothesis [Siegmund,
1976]. We therefore test the same for Hall's MPRTs and show
that the use of the alternative gives infinite reduction of the
sample-size, asymptotically, over the crude estimator. In the
last chapter, we indicate the possible lines along which this
result can be generalised for the multiple hypothesis case.
Some improvements and modifications of the results by Orsak and
Aazhang [1990] through the choice of some topological
neighbourhoods, e.g. the total variational neighbourhood, are
also discussed.



CHAPTER - II 


II. PROBLEM FORMULATION

In this chapter, we introduce the concept of
importance sampling and outline the problem of identifying the
best biasing member from a given class.

2.1 IMPORTANCE SAMPLING DISTANCE :

Let $g$ be a real valued, Borel measurable, non-decreasing
function defined on the real line. Let $X$ be the 'input'
random variable and $Y = g(X)$ the 'output' random variable, and
let the probability of a tail event of interest be expressed as

$$ p_e = \int_{-\infty}^{\infty} I_{\tau}(y)\, p_Y(y)\, dy $$

where $I_{\tau}(\cdot)$ is the indicator function of the event
$[\tau, \infty)$. We can rewrite the expression for the error
probability in terms of $p_X(x)$ as follows :

$$ p_e = \int_{-\infty}^{\infty} I_{\tau}(g(x))\, p_X(x)\, dx = \int_{-\infty}^{\infty} I(x)\, p_X(x)\, dx \tag{2.1} $$

where $I(x)$ now is the indicator function of another tail set
$[T, \infty)$. Here we have made use of the monotonicity of $g(\cdot)$.

Suppose $p_e$ in (2.1) is to be estimated through a
crude Monte-Carlo method by the sample mean

$$ \hat{p}_e = \frac{1}{N} \sum_{i=1}^{N} I(X_i) $$

where $X_i$ are iid random variables with density $p_X(x)$. The
variance of this estimator is

$$ \mathrm{Var}(\hat{p}_e) = \frac{1}{N}\, p_e (1 - p_e) \tag{2.2} $$


If $\hat{p}_e$ is specified to lie within 10% of $p_e$ with a
probability of say, 0.95, then the number of simulations
required to achieve this may be computed using Chebyshev's
inequality :

$$ P\left( |\hat{p}_e - p_e| \ge \frac{p_e}{10} \right) \le \frac{\mathrm{Var}(\hat{p}_e)}{p_e^2/100} = \frac{100}{p_e} \cdot \frac{(1 - p_e)}{N} $$

Approximating $(1 - p_e)$ by 1 and the probability of the
confidence interval also by 1, we see that $N \ge 100/p_e$ is
required to claim this level of accuracy. If $p_e$ is of the
order of $10^{-6}$ or less, then $N \ge 10^8$, which will both
tax the system exorbitantly and may well exceed the period of
the random number generator.
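
The sample-size rule above can be checked numerically. The sketch below is illustrative only: the function names, the `fail_prob` parameter and the standard Gaussian input are assumptions, not part of the thesis. It evaluates the Chebyshev bound for the crude estimator and runs a small crude Monte-Carlo experiment on an event that is not rare, so that it finishes quickly. Note that holding the bound to 0.05 instead of the order-one value used in the text multiplies the requirement by a further factor of 20.

```python
import math
import random

def chebyshev_sample_size(p_e, rel_err=0.1, fail_prob=1.0):
    """Smallest N for which the Chebyshev bound
    Var(p_hat) / (rel_err * p_e)^2 = (1 - p_e) / (N * rel_err^2 * p_e)
    does not exceed fail_prob (hypothetical helper, not from the thesis)."""
    return math.ceil((1.0 - p_e) / (fail_prob * rel_err ** 2 * p_e))

def crude_monte_carlo(threshold, n, rng):
    """Sample-mean estimate of P(X >= threshold) for X ~ N(0, 1)."""
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) >= threshold)
    return hits / n

# Order-of-magnitude rule used in the text: N ~ 100 / p_e.
n_rough = chebyshev_sample_size(1e-6)
# Requiring the bound to be at most 0.05 (95% confidence) is 20 times costlier.
n_strict = chebyshev_sample_size(1e-6, fail_prob=0.05)

# A non-rare event keeps the demonstration cheap.
rng = random.Random(0)
estimate = crude_monte_carlo(threshold=1.0, n=20000, rng=rng)
```

For a rare event such as $p_e = 10^{-6}$ the same `crude_monte_carlo` loop would need on the order of `n_rough` draws, which is the difficulty importance sampling is meant to remove.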


Kahn [1953] suggests the following alternative.
Rewriting (2.1) as

$$ p_e = \int_{-\infty}^{\infty} I(x)\, \frac{p_X(x)}{p_X^*(x)}\, dP_X^*(x) \tag{2.3} $$

where $P_X^*$ is a probability measure such that $P_X$ is
absolutely continuous wrt it and $p_X^*$ is its density. We now
call $p_X^*$ the "biasing" density and form the following
estimate for $p_e$ :

$$ \hat{p}_e^* = \frac{1}{N} \sum_{i=1}^{N} I(X_i)\, \frac{p_X(X_i)}{p_X^*(X_i)} $$

where $X_i \sim P_X^*$. (Hereafter, we uniformly drop the suffix
$X$.) The variance of this new estimate is

$$ \mathrm{Var}(\hat{p}_e^*) = \frac{1}{N} \left[ \int_{-\infty}^{\infty} I(x) \left( \frac{p(x)}{p^*(x)} \right)^2 dP^*(x) - p_e^2 \right] \tag{2.4} $$

If, by a proper choice of $p^*(x)$, we can reduce the above
variance such that

$$ \mathrm{Var}(\hat{p}_e^*) < \mathrm{Var}(\hat{p}_e), $$

then we would have achieved a corresponding reduction in the
number of simulations.
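
A minimal sketch of the weighted estimator just described, under assumptions that are not in the thesis at this point: the input is standard Gaussian, the tail set is $[T, \infty)$ with $T = 4$, and the biasing density is the same Gaussian shifted to the threshold. All function names are hypothetical.

```python
import math
import random

def normal_pdf(x, mean=0.0):
    """Density of N(mean, 1)."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def is_estimate(threshold, shift, n, rng):
    """Importance sampling estimate of P(X >= threshold), X ~ N(0, 1),
    drawing from the biasing density N(shift, 1).
    Returns (estimate, per-sample variance of the weighted indicator)."""
    weights = []
    for _ in range(n):
        x = rng.gauss(shift, 1.0)
        # I(x) * p(x) / p*(x): zero outside the tail set.
        w = normal_pdf(x) / normal_pdf(x, shift) if x >= threshold else 0.0
        weights.append(w)
    mean = sum(weights) / n
    var = sum((w - mean) ** 2 for w in weights) / (n - 1)
    return mean, var

T = 4.0
exact = 0.5 * math.erfc(T / math.sqrt(2.0))   # reference value Q(T)
rng = random.Random(1)
p_star, var_star = is_estimate(T, shift=T, n=20000, rng=rng)
var_crude = exact * (1.0 - exact)             # per-sample variance of I(X)
```

Comparing `var_star` with `var_crude` gives the per-sample variance reduction; the ratio of the two is the factor by which the number of simulations can be cut for the same accuracy.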

Theorem 1 [Orsak and Aazhang, 1990] : The choice of

$$ p^*(x) = \frac{I(x)\, p(x)}{p_e} $$

achieves the minimum variance for the importance sampling
estimator. Let us call this the "optimal biasing density",
$p_{opt}$.

Proof : Applying Jensen's inequality to the second moment
in (2.4), we have

$$ E_{P^*}\left[ I(X) \left( \frac{p(X)}{p^*(X)} \right)^2 \right] \ge \left( E_{P^*}\left[ I(X)\, \frac{p(X)}{p^*(X)} \right] \right)^2 = \left( E_P[I(X)] \right)^2 = p_e^2 $$

The equality is achieved iff $I(X)\, \dfrac{p(X)}{p^*(X)} = k$, a.s.,
where $k$ is a constant. Integrating either side over
$(-\infty, \infty)$, we have $k = p_e$.

QED


This degenerate optimal solution can intuitively be
explained as follows : We will achieve maximum reduction in
sample-size if we take samples only from the "important" region
i.e., $[T, \infty)$ and no samples at all from $(-\infty, T)$.
Then, putting zero mass over the latter region or equivalently,
truncating $p(x)$ with an indicator and normalizing it, we get
the optimal biasing density.
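
The degeneracy can be seen in a toy case. The sketch below assumes, purely for illustration, a unit-rate exponential input so that the truncated-and-normalised density can be sampled exactly; the names `p`, `p_opt` and the value of `T` are not from the thesis. Every weight equals $p_e$, so the estimator has zero variance, but building `p_opt` already required knowing $p_e$.

```python
import math
import random

T = 5.0
p_e = math.exp(-T)   # P(X >= T) for X ~ Exp(1); in practice this is the unknown

def p(x):
    """Original input density, Exp(1)."""
    return math.exp(-x) if x >= 0.0 else 0.0

def p_opt(x):
    """Optimal biasing density I(x) p(x) / p_e: note that it needs p_e itself."""
    return p(x) / p_e if x >= T else 0.0

rng = random.Random(2)
# For Exp(1), the density truncated to [T, inf) is a shifted Exp(1).
samples = [T + rng.expovariate(1.0) for _ in range(1000)]
weights = [p(x) / p_opt(x) for x in samples]

# All weights coincide (up to rounding), i.e. the estimator has zero variance.
spread = max(weights) - min(weights)
```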


In all previous works, implementable solutions have
been derived by choosing some constraint class $\mathcal{P}$
specific to the problem at hand and solving for

$$ p_o^* = \arg\min_{p^* \in \mathcal{P}} \mathrm{Var}(\hat{p}_e^*). $$

In Orsak and Aazhang [1990], $\mathcal{P}$ has been chosen as the
mixture neighbourhood.

Let us now rewrite the second moment in (2.4) as
follows :

$$ \int_{-\infty}^{\infty} I(x) \left[ \frac{p(x)}{p^*(x)} \right]^2 dP^*(x) = p_e^2 \int_{-\infty}^{\infty} \left[ \frac{I(x)\, p(x)}{p_e\, p^*(x)} \right]^2 dP^*(x) = p_e^2 \int_{-\infty}^{\infty} \left[ \frac{p_{opt}(x)}{p^*(x)} \right]^2 dP^*(x) \tag{2.5} $$

Let us define a distance measure $d(P_1, P_2)$ between the two
probability measures $P_1$ and $P_2$ as follows :

$$ d(P_1, P_2) = \int_{-\infty}^{\infty} \left[ \frac{p_1(x)}{p_2(x)} \right]^2 dP_2(x) \tag{2.6} $$

Then, $d(P_1, P_2)$ is an Ali-Silvey distance measure [Ali and
Silvey, 1966]. See Appendix A. Using (2.5) and (2.6) one can
write (2.4) as

$$ \mathrm{Var}(\hat{p}_e^*) = \frac{p_e^2}{N} \left( d(P_{opt}, P^*) - 1 \right) $$

Let us call $d_{IS}(P_{opt}, P^*) = d(P_{opt}, P^*)$ the "Importance
Sampling distance" as in Orsak and Aazhang (1990). Using Jensen's
inequality or the properties of Ali-Silvey distance measures,
we can show that $d_{IS}(P_{opt}, P^*) \ge 1$. Again,
$d_{IS}(P_{opt}, P^*) = 1$ iff $P^* = P_{opt}$, and then the
variance is zero, the minimum possible.

Thus, we reformulate our original problem as follows : solve
for

$$ p_o^* = \arg\min_{p^* \in \mathcal{P}} d_{IS}(P_{opt}, P^*) \tag{2.7} $$

where $\mathcal{P}$ is the given constraint class.
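
As a check on the link between the variance of the importance sampling estimator and the distance $\int (p_{opt}/p^*)^2 dP^*$, the following sketch evaluates that distance in closed form for an assumed special case: standard Gaussian input and a unit-variance Gaussian biasing density shifted by `theta`. For that case the integral of $p^2/p^*$ over $[T, \infty)$ works out to $e^{\theta^2} Q(T + \theta)$. The function names are hypothetical.

```python
import math

def q_func(t):
    """Gaussian tail probability Q(t) = P(N(0,1) >= t)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def d_is_gaussian_shift(T, theta):
    """Importance sampling distance between P_opt and N(theta, 1) for a
    N(0, 1) input and tail set [T, inf). The second moment
    int_T^inf p(x)^2 / p*(x) dx equals exp(theta^2) * Q(T + theta)."""
    return math.exp(theta ** 2) * q_func(T + theta) / q_func(T) ** 2

def is_variance(T, theta, n):
    """Variance of the estimator: p_e^2 * (d_IS - 1) / N."""
    p_e = q_func(T)
    return p_e ** 2 * (d_is_gaussian_shift(T, theta) - 1.0) / n

T, N = 3.0, 1000
var_crude = is_variance(T, 0.0, N)   # theta = 0 is the crude estimator
var_shift = is_variance(T, T, N)     # shift equal to the threshold
```

With `theta = 0` the expression reduces to $p_e(1 - p_e)/N$, the crude variance, which is a useful sanity check on the algebra.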


2.2 KOBAYASHI'S THEOREM :


Kobayashi's theorem gives a relationship between distance
measures and the robust detection problem. We use this
relationship to further study the importance sampling problem.


Let $(\Omega, \mathcal{F})$ be a measurable space and $P_0$ and
$P_1$ two probability measures on it, with densities $p_0$ and
$p_1$ respectively, wrt some measure $\mu$. Unknown deviations
from the nominal pair can be modelled by blowing $P_i$ up into
composite hypotheses

$$ \mathcal{P}_i = \{ Q \mid Q = (1 - \varepsilon_i) P_i + \varepsilon_i H_i,\ H_i \in \mathcal{H} \}, \quad i = 0, 1 \tag{2.8} $$

where $0 \le \varepsilon_i < 1$ are the "radii" of the two classes
and $\mathcal{H}$ is the class of all probability measures on
$(\Omega, \mathcal{F})$. Let $\phi(x)$ be the conditional
probability of acceptance of $\mathcal{P}_1$ given $x$. If a loss
of $L_i > 0$ is incurred when $\mathcal{P}_i$ is falsely rejected,
then the expected loss or risk under $Q_i'$ is

$$ R(Q_i', \phi) = L_i\, E_{Q_i'} | i - \phi |, \quad i = 0, 1 $$

where $Q_i'$ is the true underlying distribution. The classical
minimax, Neyman-Pearson and Bayes's criteria are now given a
universal minimax spirit :

i) minimise $\max_{i = 0, 1} \sup_{Q_i' \in \mathcal{P}_i} R(Q_i', \phi)$

ii) minimise $\sup_{Q_1'} R(Q_1', \phi)$, subject to $\sup_{Q_0'} R(Q_0', \phi) \le \alpha$

iii) minimise $\sup_{Q_0', Q_1'} \{ \lambda_0 R(Q_0', \phi) + \lambda_1 R(Q_1', \phi) \}$

where $\lambda_i$'s are the priors.


For the test of $Q_0$ against $Q_1$ given by

$$ \phi(x) = \begin{cases} 1, & q_1(x)/q_0(x) > \gamma \\ c(x), & q_1(x)/q_0(x) = \gamma \\ 0, & q_1(x)/q_0(x) < \gamma \end{cases} $$

Huber (1965) has shown that there exists a pair of
distributions $Q_i \in \mathcal{P}_i$ such that

$$ R(Q_i', \phi) \le R(Q_i, \phi) $$

and hence the minimax, Neyman-Pearson and the Bayes's criteria
for the test of $Q_0$ and $Q_1$ are exactly equivalent to
conditions (i), (ii) and (iii) above, respectively. This
"least favourable pair" was also given by Huber :

$$ q_0(x) = \begin{cases} (1 - \varepsilon_0)\, p_0(x), & p_1(x)/p_0(x) < C'' \\ (1/C'')(1 - \varepsilon_0)\, p_1(x), & p_1(x)/p_0(x) \ge C'' \end{cases} \tag{2.8} $$

$$ q_1(x) = \begin{cases} (1 - \varepsilon_1)\, p_1(x), & p_1(x)/p_0(x) > C' \\ C'(1 - \varepsilon_1)\, p_0(x), & p_1(x)/p_0(x) \le C' \end{cases} $$

where $0 \le C' < C'' \le \infty$.

We state another useful result here as a theorem.

Theorem 2 [See Huber (1965) Lemma 2 and the discussion that
follows] : If for any $Q_i' \in \mathcal{P}_i$, $i = 0, 1$, and any
real $t$, we have

$$ Q_0'[q_1/q_0 < t] \ge Q_0[q_1/q_0 < t] \ge Q_1[q_1/q_0 < t] \ge Q_1'[q_1/q_0 < t], $$

then $(Q_0, Q_1)$ is the least favourable pair.

Note : $(Q_0, Q_1)$ is referred to as "least favourable in Bayes's
sense" if

$$ (Q_0, Q_1) = \arg\sup_{Q_0', Q_1'} \{ \lambda_0 R(Q_0', \phi) + \lambda_1 R(Q_1', \phi) \} $$

Theorem 3 : $(Q_0, Q_1)$ is least favourable in Bayes's sense for
all sets of priors $(\lambda_0, \lambda_1)$ if and only if

$$ d(Q_0, Q_1) \le d(Q_0', Q_1') $$

for all Ali-Silvey distance measures $d$.

For proof of this theorem, see Kobayashi (1967).
Intuitively, a given pair will be least favourable only if they
are "close" to each other and as far from the nominal measures
as possible.


2.3 PROBLEM STATEMENT :


In Orsak and Aazhang (1990), the Huber neighbourhood
(2.8) was defined as follows :

$$ \varepsilon_1 = 0, \quad P_1 = P_{opt} $$

and $P_0$, some arbitrary nominal density. Then, for some mixture
or parametric (e.g., shift class) class $\mathcal{P}_0$, the least
favourable density $Q_0$ was written using (2.8). By Kobayashi's
theorem (Theorem 3, above), this minimises the Importance
Sampling distance from $P_{opt}$, i.e.,

$$ Q_0 = \arg\min_{Q_0' \in \mathcal{P}_0} d_{IS}(P_{opt}, Q_0') $$

Hence $Q_0$ is an optimal solution for the Importance Sampling
problem subject to the constraints used.

By a different use of Kobayashi's theorem, we now
show that this is not so.

Counter-Example 1 : Consider the double-exponential location
family

$$ f_{T'}(x) = \tfrac{1}{2}\, e^{-|x - T'|} $$

Let $f_0(x)$ be the density of the null hypothesis and the
probability of false-alarm be

$$ p_e = \int_T^{\infty} f_0(x)\, dx = \tfrac{1}{2}\, e^{-T}. $$

This is the quantity to be evaluated by simulation (assumed
unknown). Then,

$$ p_{opt}(x) = \begin{cases} e^{T - x}, & x \ge T \\ 0, & \text{otherwise} \end{cases} $$

To choose a sub-optimal density from this family,
let us compute the Importance Sampling distance and
minimise it :

$$ d_{IS}(P_{opt}, P_{T'}) = \begin{cases} 2\, e^{T - T'}, & T' \le T \\ \tfrac{2}{3}\, e^{T' - T} + \tfrac{4}{3}\, e^{-2(T' - T)}, & T' \ge T \end{cases} $$

This is minimised by $T' = T$ over $T' \le T$ and by
$T' = T + \tfrac{1}{3} \ln 4$ for $T' \ge T$. Hence, the sub-optimal
$T'$ is given by

$$ T' = T + \tfrac{1}{3} \log 4 \tag{2.9} $$

Let us also compute the Kullback-Leibler distance, which is
also an Ali-Silvey distance, between $p_{opt}(x)$ and the generic
member $f_{T'}$ :

$$ d_{KL}(P_{opt}, P_{T'}) = \int_{-\infty}^{\infty} p_{opt}(x)\, \log \frac{p_{opt}(x)}{f_{T'}(x)}\, dx $$

By direct calculation, we see that the $T'$ minimising the above
is

$$ T' = T + \log 2 \tag{2.10} $$

Since the minima of these two Ali-Silvey distances occur at two
different points, by Kobayashi's theorem, the least favourable
density does not exist.
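
The two minimisers for the double-exponential location family can be confirmed numerically. The sketch below works in the offset $u = T' - T$ and uses closed forms obtained by integrating against $p_{opt}(x) = e^{T-x}$ on $[T, \infty)$; both distances then depend on $u$ only. The closed forms, the grid search and all names are an illustration assumed here, not the computation of the thesis.

```python
import math

def d_is(u):
    """Importance sampling distance int p_opt^2 / f_{T'} dx, with u = T' - T."""
    if u <= 0.0:
        return 2.0 * math.exp(-u)
    return (2.0 / 3.0) * math.exp(u) + (4.0 / 3.0) * math.exp(-2.0 * u)

def d_kl(u):
    """Kullback-Leibler distance int p_opt log(p_opt / f_{T'}) dx, u = T' - T.
    Uses E|E - u| for E ~ Exp(1): 1 - u if u <= 0, else u - 1 + 2 exp(-u)."""
    if u <= 0.0:
        return math.log(2.0) - u
    return u - 2.0 + math.log(2.0) + 2.0 * math.exp(-u)

def grid_argmin(func, lo=-1.0, hi=3.0, step=1e-4):
    """Brute-force minimiser on a uniform grid (adequate for a 1-D check)."""
    n = int(round((hi - lo) / step))
    return min((lo + i * step for i in range(n + 1)), key=func)

u_is = grid_argmin(d_is)   # expected near (1/3) * ln 4
u_kl = grid_argmin(d_kl)   # expected near ln 2
```

The two offsets differ, in line with the counter-example, yet both are constants independent of $T$, so the corresponding shifts become relatively close as $T$ grows.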

Counter-Example 2 : Again, for the single parameter
exponential class

$$ p_{\theta}(x) = \exp(\theta x - \psi(\theta))\, p_0(x), $$

where $\psi(\theta)$ is a convex function normalised such that
$\psi(0) = 0$, we can directly compute the importance sampling
and the Kullback-Leibler distances and minimise them. Here

$$ p_{opt}(x) = I_{[T, \infty)}(x)\, p_0(x) / p_e, \quad \text{where} \quad p_e = \int_T^{\infty} p_0(x)\, dx. $$

$d_{KL}$ is minimised by the $\theta_o$ that solves

$$ \psi'(\theta_o) = \mu_{1,opt}, \quad \text{where} \quad \mu_{1,opt} = E_{p_{opt}}(X) \tag{2.11} $$

and $d_{IS}$ is minimised by the $\theta_o$ that solves

$$ \psi'(\theta_o) = E_{p_{opt}}[X e^{-\theta_o X}] \,/\, E_{p_{opt}}[e^{-\theta_o X}] \tag{2.12} $$

As we show later, these two are only asymptotically (i.e., for
$T \to \infty$) equal. Hence, for $p_e > 0$, the least favourable
density does not exist.
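
To make the two stationarity conditions concrete, the sketch below specialises them to an assumed instance: the unit-variance Gaussian mean family, which is a single-parameter exponential family with $\psi(\theta) = \theta^2/2$. For it the Kullback-Leibler condition becomes $\theta = \phi(T)/Q(T)$ and the importance sampling condition becomes $2\theta = \phi(T + \theta)/Q(T + \theta)$, where $\phi$ and $Q$ are the standard Gaussian density and tail. The bisection bracket and the function names are assumptions made for this illustration.

```python
import math

def hazard(s):
    """Standard Gaussian hazard phi(s) / Q(s), i.e. E[X | X >= s]."""
    phi = math.exp(-0.5 * s * s) / math.sqrt(2.0 * math.pi)
    return phi / (0.5 * math.erfc(s / math.sqrt(2.0)))

def theta_kl(T):
    """KL-minimising shift: psi'(theta) = theta = E_opt[X]."""
    return hazard(T)

def theta_is(T, tol=1e-10):
    """IS-minimising shift: solve 2*theta = hazard(T + theta) by bisection.
    The left side minus the right side is increasing in theta and changes
    sign on (0, T + 1) for T >= 1."""
    lo, hi = 0.0, T + 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 2.0 * mid - hazard(T + mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# The two solutions differ for finite T but both approach T as T grows.
pairs = {T: (theta_kl(T), theta_is(T)) for T in (3.0, 8.0)}
```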

We observe from the above two cases that though the
minima occur at different parameter values for the two distance
measures, for large $T$ or small $p_e$, they are pretty close.
This is obvious from 2.9 and 2.10 in counter example 1, but for
counter example 2, using 2.11 and 2.12, we do this by first
obtaining

$$ \lim_{T \to \infty} \frac{\psi'(\theta_o)}{T} = \lim_{T \to \infty} \frac{\mu_{1,opt}}{T} = \lim_{T \to \infty} \frac{\int_T^{\infty} x\, p_0(x)\, dx}{T \int_T^{\infty} p_0(x)\, dx} = 1 $$

under some mild conditions on $p_0(x)$ (see Chapter 3) and then,

$$ \lim_{T \to \infty} \frac{1}{T} \cdot \frac{E_{p_{opt}}[X e^{-\theta_o X}]}{E_{p_{opt}}[e^{-\theta_o X}]} = 1 $$

Since the values of the parameters minimising these two
distance measures are approximately equal for large $T$, in both
the above cases, one can consider using the Kullback-Leibler
distance measure instead of the Importance Sampling distance to
find sub-optimal parameter estimates.

The above mentioned indirect route of minimising the
Kullback-Leibler distance and assuming that, asymptotically, all
other distance measures are minimised at the same point is
further justified by a rigorous proof that we give in section
3.3 for the existence of the least favourable density for the
shift class under the condition that $p_e \to 0$.

The difficulty of working with the Importance
Sampling distance or equivalently, the second moment of the
Importance Sampling estimate is clearly illustrated by (2.12)
above.



CHAPTER - III 


LOCATION, SCALE, LOCATION-SCALE AND SINGLE PARAMETER
EXPONENTIAL FAMILIES


In this chapter, we identify the best biasing members
from the location (shift), scale, location (shift) - scale and
single parameter exponential families. We also prove the
asymptotic optimality for the first case.

3.1 MINIMA OF KULLBACK-LEIBLER DISTANCE :


The two counter examples in Chapter II show that
at least for the double-exponential location family and the
single-parameter exponential family, for large $T$ (small $p_e$),
the Importance Sampling distance can be substituted with the
Kullback-Leibler distance in equation (2.7). The main advantage
resulting from this is the ease of handling. Further, as we
shall see shortly, an implementable and convenient estimate of
the parameter value that minimises the Kullback-Leibler
distance can be obtained. If, for a given parametric family,
the least favourable member for the corresponding $P_{opt}$
exists, then by Kobayashi (1967), we are fully justified in
using this as the sub-optimal biasing density. Otherwise, with
the counter examples in mind, we may assume that this is "close"
to the sub-optimal density for large $T$ and then cross check the
result by actually computing asymptotically ($T \to \infty$) the
ratio of the second moment for this Importance Sampling
estimator to the second moment of the crude estimator.


In the literature so far, the following three families
have been studied, under different contexts :

i) Shift : $p(x - \tau)$, $\tau \in (-\infty, \infty)$

ii) Scale : $\frac{1}{\sigma}\, p(x/\sigma)$, $\sigma \in (0, \infty)$

iii) Tilt : $e^{\theta x} p(x)$, normalised to unit mass

Motivated by this, we consider for the analysis outlined above,
the location, scale, location-scale families and the
single-parameter exponential family. First, we consider the
example of the Gaussian family :


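Example :

i) Shift

Let $p_0(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$ and
$p_{\theta}(x) = \frac{1}{\sqrt{2\pi}}\, e^{-(x - \theta)^2/2}$.

Then, $p_e = \int_T^{\infty} p_0(x)\, dx$ and
$p_{opt}(x) = I(x)\, p_0(x) / p_e$, and

$$ d_{KL}(P_{opt}, P_{\theta}) = \int_{-\infty}^{\infty} \log \left[ \frac{p_{opt}(x)}{p_{\theta}(x)} \right] dP_{opt}(x) $$

This is minimised for
$\theta_o = \mu_{1,opt} = \int_{-\infty}^{\infty} x\, p_{opt}(x)\, dx$.
Since $p_e$ is unknown, $\mu_{1,opt}$ is also not known. However,

$$ \lim_{T \to \infty} \frac{\mu_{1,opt}}{T} = \lim_{T \to \infty} \frac{\int_T^{\infty} x\, e^{-x^2/2}\, dx}{T \int_T^{\infty} e^{-x^2/2}\, dx} = 1 \tag{3.1} $$

Alternatively, by using the well-known [Ref. no.
24, Feller (1966)] asymptotically tight bound on

$$ Q(T) = \frac{1}{\sqrt{2\pi}} \int_T^{\infty} e^{-x^2/2}\, dx, $$

namely

$$ \left( \frac{1}{T} - \frac{1}{T^3} \right) \frac{1}{\sqrt{2\pi}}\, e^{-T^2/2} \le Q(T) \le \frac{1}{T} \cdot \frac{1}{\sqrt{2\pi}}\, e^{-T^2/2}, $$

we can obtain the corresponding bound on $\mu_{1,opt}$ as

$$ T \le \mu_{1,opt} \le \frac{T^3}{T^2 - 1}, \quad \text{for } T > 1 \tag{3.2} $$

Therefore $\mu_{1,opt} \approx T$ for large $T$. In other words, to
estimate $p_e$, one uses
$p_T(x) = \frac{1}{\sqrt{2\pi}}\, e^{-(x - T)^2/2}$ as the biasing
density. This result is consistent with Orsak and Aazhang (1989),
Wessel, Hall and Wise (1988) and Beaulieu (1990).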
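
For the Gaussian shift case, the conditional mean of the input given that it exceeds $T$ is the standard Gaussian density at $T$ divided by the tail probability $Q(T)$, which makes the claim that the Kullback-Leibler-optimal shift approaches the threshold easy to tabulate. The sketch below is an assumed illustration (function and variable names are hypothetical); it compares that mean with $T$ and with the upper limit $T^3/(T^2 - 1)$ that follows from Feller's lower bound on $Q(T)$.

```python
import math

def mu1_opt(T):
    """Mean of the standard Gaussian truncated to [T, inf): phi(T) / Q(T)."""
    phi = math.exp(-0.5 * T * T) / math.sqrt(2.0 * math.pi)
    return phi / (0.5 * math.erfc(T / math.sqrt(2.0)))

# (threshold, KL-optimal shift, upper bound from Feller's inequality, ratio to T)
table = []
for T in (2.0, 4.0, 8.0):
    mu = mu1_opt(T)
    table.append((T, mu, T ** 3 / (T ** 2 - 1.0), mu / T))

for T, mu, upper, ratio in table:
    print(f"T={T:4.1f}  mu1_opt={mu:.4f}  upper={upper:.4f}  mu1_opt/T={ratio:.4f}")
```

The ratio in the last column decreases towards 1 as $T$ grows, which is the sense in which a shift equal to the threshold is used in place of the unknown conditional mean.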


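ii) Scale

Here, $p_0(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$ and
$p_{\sigma}(x) = \frac{1}{\sqrt{2\pi}\, \sigma}\, e^{-x^2/(2\sigma^2)}$.

$$ d_{KL}(P_{opt}, P_{\sigma}) = \int_{-\infty}^{\infty} \log \left[ \frac{p_{opt}(x)}{p_{\sigma}(x)} \right] dP_{opt}(x) $$

and $\sigma_o^2 = \mu_{2,opt} = \int_{-\infty}^{\infty} x^2\, p_{opt}(x)\, dx$
minimises the above Kullback-Leibler distance. Again,

$$ \lim_{T \to \infty} \frac{\sigma_o^2}{T^2} = 1 $$

Further, the exact bounds on $\mu_{2,opt}$ are as follows :

$$ 1 + T^2 \le \mu_{2,opt} \le 1 + \frac{T^4}{T^2 - 1}, \quad \text{for } T > 1 \tag{3.3} $$

$$ \therefore\ \sigma_o^2 = \mu_{2,opt} \approx T^2 \tag{3.4} $$

iii) Shift and Scale :
$p_{opt}(x) = \frac{I(x)}{p_e} \cdot \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$
and
$p_{\theta,\sigma}(x) = \frac{1}{\sqrt{2\pi}\, \sigma}\, e^{-(x - \theta)^2/(2\sigma^2)}$.

$$ d_{KL}(P_{opt}, P_{\theta,\sigma}) = -\log p_e + \log \sigma - \frac{\mu_{2,opt}}{2} + \frac{\mu_{2,opt} - 2\theta\, \mu_{1,opt} + \theta^2}{2\sigma^2} $$

$$ \theta_o = \mu_{1,opt} \quad \text{and} \quad \sigma_o^2 = \mu_{2,opt} - \mu_{1,opt}^2 \tag{3.5} $$

minimise the above jointly.

Therefore, $\theta_o \approx T$ and $0 < \sigma_o^2 < 2$, for large
$T$. This solution is modified in section 3.2 when the actual
performance of these three estimators are computed.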

In the following three sections, by imposing some
conditions, we obtain estimates similar to the above for the
general location and scale families. To that end, first we
present the following lemma.

Lemma 1

For a density function $f(x)$ satisfying

$$ \int_T^{\infty} f(x)\, dx \ll T f(T) \quad \text{for large } T, $$

$$ \mu_{1,opt} = \int_{-\infty}^{\infty} x\, p_{opt}(x)\, dx \approx T \quad \text{for large } T, $$

where $p_{opt}(x) = I(x) f(x) / k$ and $k = \int_T^{\infty} f(x)\, dx$
is the quantity to be estimated.

Proof :

$$ \lim_{T \to \infty} \frac{\mu_{1,opt}}{T} = \lim_{T \to \infty} \frac{\int_T^{\infty} x f(x)\, dx}{T \int_T^{\infty} f(x)\, dx} = \lim_{T \to \infty} \frac{-T f(T)}{\int_T^{\infty} f(x)\, dx - T f(T)} = 1 $$

(by L'Hospital's rule and Leibnitz theorem).

Remark : It is easily verified that the above condition is
satisfied for the double exponential and Gaussian families.

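3.1.1 Shift family :

Let $p_0(x) = f(x)$ and $p_{\theta}(x) = f(x - \theta)$.

$$ p_{opt}(x) = \frac{I(x) f(x)}{k}, \quad \text{where} \quad k = \int_T^{\infty} f(x)\, dx $$

is the quantity to be estimated. Let $f(x)$ be such that

i) $f(\cdot)$ is differentiable and is normalised by $f'(0) = 0$

ii) $\phi(x) = \dfrac{f'(x)}{f(x)}$ is convex and monotone
decreasing over $[0, \infty)$.

Then $d_{KL}(P_{opt}, P_{\theta})$ is minimised by the $\theta_o$
that solves

$$ \frac{1}{k} \int_T^{\infty} \left[ \frac{f'(x - \theta_o)}{f(x - \theta_o)} \right] f(x)\, dx = 0 $$

We solve for $\theta_o$ as follows :

$$ \frac{1}{k} \int_T^{\infty} \left[ \frac{f'(x - \theta_o)}{f(x - \theta_o)} \right] f(x)\, dx = \frac{1}{k} \int_T^{\infty} \phi(x - \theta_o)\, f(x)\, dx \ge \phi\left( \frac{1}{k} \int_T^{\infty} (x - \theta_o)\, f(x)\, dx \right) = \phi(\mu_{1,opt} - \theta_o) = 0 \tag{3.6} $$

when $\theta_o = \mu_{1,opt}$. Use has been made of Jensen's
inequality and conditions (i) and (ii) above. Again,

$$ \frac{1}{k} \int_T^{\infty} \left[ \frac{f'(x - \theta_o)}{f(x - \theta_o)} \right] f(x)\, dx \le \frac{f'(T - \theta_o)}{f(T - \theta_o)} = 0 \tag{3.7} $$

when $\theta_o = T$, by (i) and (ii) above.

But for large $T$ (by Lemma 1)

$$ \mu_{1,opt} \approx T \tag{3.8} $$

Therefore, from 3.6, 3.7 and 3.8, we see that the
Kullback-Leibler distance for this case is minimised by
$\theta_o \approx T$, for large $T$.

Conditions (i) and (ii) are satisfied for the
generalised Gaussian family

$$ f(x) = \frac{p}{2\, \Gamma(1/p)\, A(p)} \exp\left\{ -\left[ \frac{|x|}{A(p)} \right]^p \right\} \tag{3.9} $$

where

$$ A(p) = \left[ \sigma^2\, \Gamma(1/p) / \Gamma(3/p) \right]^{1/2}, \quad \text{for all } p > 1. $$

See Orsak and Aazhang (1989).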

3.1.2 Scale family :

Let $p_0(x) = f(x)$ and $p_{\sigma}(x) = \frac{1}{\sigma} f(x/\sigma)$.

$$ \therefore\ p_{opt}(x) = \frac{I(x) f(x)}{k}, \quad \text{where} \quad k = \int_T^{\infty} f(x)\, dx $$

is the quantity to be evaluated.

The Kullback-Leibler distance $d_{KL}(P_{opt}, P_{\sigma})$ is
minimised by the $\sigma_o$ which satisfies

$$ \frac{1}{k} \int_T^{\infty} \left( \frac{x}{\sigma_o} \right) \frac{f'(x/\sigma_o)}{f(x/\sigma_o)}\, f(x)\, dx = -1 $$

We impose similar conditions on $f$ here, as in the previous
case :

i) $f(\cdot)$ is differentiable and is normalised such that $f'(0) = 0$

ii) $\psi(y) = y\, \dfrac{f'(y)}{f(y)}$ is convex and monotone
decreasing over $[0, \infty)$.

Then,

$$ \frac{1}{k} \int_T^{\infty} \left( \frac{x}{\sigma_o} \right) \frac{f'(x/\sigma_o)}{f(x/\sigma_o)}\, f(x)\, dx = \frac{1}{k} \int_T^{\infty} \psi\left( \frac{x}{\sigma_o} \right) f(x)\, dx \ge \psi\left( \frac{\mu_{1,opt}}{\sigma_o} \right) $$

(by Jensen's inequality)

If we set the r.h.s. above equal to $-1$, then

$$ \frac{1}{k} \int_T^{\infty} \left( \frac{x}{\sigma_o} \right) \frac{f'(x/\sigma_o)}{f(x/\sigma_o)}\, f(x)\, dx \ge -1 $$

But for large $T$, in view of Lemma 1,

$$ \frac{T}{\sigma_o} \cdot \frac{f'(T/\sigma_o)}{f(T/\sigma_o)} \approx -1 \tag{3.10} $$

Again,

$$ \frac{1}{k} \int_T^{\infty} \left( \frac{x}{\sigma_o} \right) \frac{f'(x/\sigma_o)}{f(x/\sigma_o)}\, f(x)\, dx \le \frac{T}{\sigma_o} \cdot \frac{f'(T/\sigma_o)}{f(T/\sigma_o)} \tag{3.11} $$

From 3.10 and 3.11 above, we conclude that solving

$$ \frac{T}{\sigma_o} \cdot \frac{f'(T/\sigma_o)}{f(T/\sigma_o)} = -1 $$

asymptotically minimises the Kullback-Leibler distance in this
case.
3.1.3 Single-parameter exponential family :

Referring to counter example 2 in Chapter II, we see
that to estimate

$$ k = \int_T^{\infty} p_0(x)\, dx, $$

the Kullback-Leibler distance is minimised by the $\theta_o$ that
solves $\psi'(\theta_o) = \mu_{1,opt} \approx T$.

The conditions in sections 3.1.1 and 3.1.2 are
satisfied by the generalised Gaussian family in (3.9), for all
$p > 1$. A condition equivalent to 3.1.1 (ii), namely that
$-\log(f(x))$ is convex, has been imposed in Orsak and Aazhang
(1989), but only a much weaker result, that is, that
$\theta_o = T$ proves better than the crude estimator (i.e.
$\theta_o = 0$), has been obtained (through an ad hoc approach).

The results obtained for the Gaussian example
initially are seen to be in agreement with those of sections
3.1.1, 3.1.2 and 3.1.3.

3.2 PERFORMANCE COMPARISON :


In this section, we study the relative performance of
the different biasing schemes outlined above, namely, shifting,
scaling, shifting and scaling and also of the crude estimation
method.

Gaussian Example :

Here, the second moment of the crude estimator

$$ \mu_{2,MC} = \frac{1}{\sqrt{2\pi}} \int_T^{\infty} e^{-x^2/2}\, dx \approx \frac{1}{\sqrt{2\pi}} \cdot \frac{e^{-T^2/2}}{T} \tag{3.12} $$

Here we have used the asymptotically tight bound on $Q(T)$ as in
the previous section.

With $p_{\theta_o}(x)$, $\theta_o = \mu_{1,opt} \approx T$ as the
biasing density,

$$ \mu_{2,(shift=T)} = \int_T^{\infty} \left[ \frac{p_0(x)}{p_{\theta_o}(x)} \right]^2 p_{\theta_o}(x)\, dx \approx \frac{1}{\sqrt{2\pi}} \cdot \frac{e^{-T^2}}{2T} \tag{3.13} $$

With $p_{\sigma_o}(x)$, $\sigma_o \approx T$ as the biasing density

$$ \mu_{2,(scale=T)} = \int_T^{\infty} \left[ \frac{p_0(x)}{p_{\sigma_o}(x)} \right]^2 p_{\sigma_o}(x)\, dx \approx \frac{1}{2\sqrt{2\pi}}\, e^{-(T^2 - 1/2)} \tag{3.14} $$

From 3.12 and 3.13,

$$ \frac{\mu_{2,(shift=T)}}{\mu_{2,MC}} \approx \frac{1}{2}\, e^{-T^2/2} \to 0 \quad \text{as } T \to \infty \tag{3.15} $$

Again,

$$ \frac{\mu_{2,(shift=T)}}{\mu_{2,(scale=T)}} \approx \frac{e^{-1/2}}{T} \to 0 \quad \text{as } T \to \infty \tag{3.16} $$

From 3.15 and 3.16, we infer that asymptotically shifting
performs infinitely better than both the scaling and the crude
methods.
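
The Gaussian comparison can be reproduced without asymptotic approximations, since each second moment has a closed form in terms of the complementary error function. The sketch below is an illustration under stated assumptions: standard Gaussian input, shift equal to $T$, scale $\sigma = T$, and hypothetical function names. The closed forms used are $e^{T^2} Q(2T)$ for the shifted density and $\frac{\sigma}{\sqrt{2\pi}} \int_T^\infty e^{-a x^2} dx$ with $a = 1 - 1/(2\sigma^2)$ for the scaled one.

```python
import math

def q_func(t):
    """Gaussian tail probability Q(t)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def m2_crude(T):
    """Second moment of the indicator under N(0, 1): simply p_e = Q(T)."""
    return q_func(T)

def m2_shift(T):
    """int_T^inf p0^2 / p_T dx with p_T = N(T, 1): exp(T^2) * Q(2T)."""
    return math.exp(T * T) * q_func(2.0 * T)

def m2_scale(T, sigma):
    """int_T^inf p0^2 / p_sigma dx with p_sigma = N(0, sigma^2).
    Finite only when sigma^2 > 1/2, so that a below is positive."""
    a = 1.0 - 1.0 / (2.0 * sigma ** 2)
    tail = 0.5 * math.sqrt(math.pi / a) * math.erfc(T * math.sqrt(a))
    return sigma / math.sqrt(2.0 * math.pi) * tail

T = 6.0
shift_vs_crude = m2_shift(T) / m2_crude(T)      # compare with 0.5 * exp(-T^2 / 2)
shift_vs_scale = m2_shift(T) / m2_scale(T, T)   # compare with exp(-1/2) / T
```

At $T = 6$ both exact ratios are already within a few percent of the asymptotic expressions, and both shrink as $T$ increases.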


General case :

Again, we directly evaluate the ratio of the second
moments as follows :

$$ \lim_{T \to \infty} \frac{\mu_{2,(shift=T)}}{\mu_{2,MC}} = \lim_{T \to \infty} \frac{\int_T^{\infty} \frac{f^2(x)}{f(x - T)}\, dx}{\int_T^{\infty} f(x)\, dx} = \lim_{T \to \infty} \frac{f(T)}{f(0)} = 0 $$

$$ \lim_{T \to \infty} \frac{\mu_{2,(shift=T)}}{\mu_{2,(scale=T)}} = \lim_{T \to \infty} \frac{\int_T^{\infty} \frac{f^2(x)}{f(x - T)}\, dx}{\int_T^{\infty} \frac{f^2(x)}{\frac{1}{\sigma_o} f(x/\sigma_o)}\, dx} = 0 $$

(by L'Hospital's rule and Leibnitz theorem).

Hence, for the general case, again, shift by $T$
performs better than both the crude method and the scaling.

Shifting and scaling : 

In the previous section, for the shifted and scaled
version of $p_{\theta,\sigma}(x)$, we obtained the following solution :
$\theta_0 = \ldots$ and $0 < \sigma_0^2 < 2 + \ldots$,
or asymptotically $\theta_0 \approx T$, $\sigma_0^2 \in (0,2)$.

Let us further investigate this biasing technique for the
Gaussian family. Computing the second moment of the importance
sampling estimator with $p_{\theta,\sigma}(x)$ as the biasing density, with
$\theta = T$ and some arbitrary $\sigma > 0$, we obtain,

$$\mu_{2(shift\ and\ scale)} = \int_T^\infty \left[ \frac{p(x)}{p_{\theta,\sigma}(x)} \right]^2 p_{\theta,\sigma}(x)\, dx \approx \sigma \cdot \frac{1}{\sqrt{2\pi}} \cdot \frac{1}{2T}\, e^{-T^2} \qquad (3.17)$$

From 3.17 and 3.13, we get

$$\frac{\mu_{2(shift\ and\ scale)}}{\mu_{2(shift)}} \approx \sigma$$

Hence for any value of $\sigma < 1$, the shifted and scaled version
performs better than the shifted version. This is an
intuitively satisfying result.

In these two sections, we have managed to put the
various biasing schemes under a unified framework and improved
upon the existing results for some parametric classes,
especially for the Gaussian case.



3.3 ASYMPTOTIC OPTIMALITY OF SHIFT THROUGH THRESHOLD :

Let us form a Huber neighbourhood (2.8) as follows :

Put = 0 and 

$$\mathcal{P}_1 = \{ p_{opt} \}, \qquad \mathcal{P}_0 = \{ q \mid q = p(x - \theta),\ \theta \in (0, \infty) \}.$$

Now, suppose the least favourable pair $(q_0, p_{opt})$ exists. Then
by Kobayashi's theorem,

$$q_0 = \arg\min_{q \in \mathcal{P}_0} d_{IS}(q, p_{opt}).$$

But, counter examples in section 2.3 have shown the
non-existence of this pair, at least for the double
exponential and the exponential classes, for finite T. Here, we
prove that for $T \to \infty$, this pair does exist for any shift
class, under suitable conditions. To prove this, we use Huber's
lemma restated as theorem 2 in section 2.2. We rewrite this
here for convenience :
If a pair $(q_0, q_1)$ satisfies,



( 3 . 18 ) 

for any real t, then it is a least favourable pair. 


Theorem 4 :

If $\mathcal{P}_1 = \{ p_{opt} \}$
and $\mathcal{P}_0 = \{ q \mid q = p(x - \theta),\ \theta \in (0, \infty) \}$
where

$$p_{opt}(x) = \frac{p(x)\, I\{x \ge T\}}{\int_T^\infty p(s)\, ds}$$

then the pair $(p(x-T),\ p_{opt}(x))$ is least favourable
asymptotically, as $T \to \infty$, under the following conditions :

i) $p(\cdot)$ is differentiable.

ii) $\frac{p(x)}{p(x - \theta)}$ is monotone decreasing wrt x for $\theta > 0$ and is
bounded above.
Proof :

We see that for this class, the right most two
inequalities in (3.18) are trivially satisfied. We now assume
that for some $T_0 = T_0(T)$, $p(x - T_0) = q_0$. Then, let us
minimise

$$I = \int_{\left\{ \frac{p_{opt}(x)}{p(x - T_0)} < t \right\}} p(x - f(T))\, dx$$

wrt f(T), for all T and t.

Now, given a t, there exists an $x = \nu(T_0, t)$ such that on
$A = \{ x : x > \nu(T_0, t) \}$,

$$\frac{p_{opt}(x)}{p(x - T_0)} < t \quad \text{(by condition ii)}$$

Then,

$$I = \int_{-\infty}^{T} p(x - f(T))\, dx + \int_{\nu(T_0,t)}^{\infty} p(x - f(T))\, dx$$

is to be minimised wrt f(T) for all T. Now,

$$I' = p(T - f(T))\, [1/f'(T) - 1] + p[\nu - f(T)]\, [1 - \nu'(T_0, t)] = 0$$

for all T.

$\Rightarrow f(T) = T + \ldots$ and $\nu(T_0, t) = T + \ldots$

Thus, asymptotically $\nu(T_0, t)$ is independent of t and $\lim_{T \to \infty} f(T)/T = 1$.

QED.

The double-exponential family does not satisfy the
above condition (no. ii). Hence, we give a separate proof for
this here.


Proposition :

Theorem 4 holds for the double exponential family,
without condition (ii).



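First consider the case when $\theta \in (0, T]$ :

$$\left\{ x : \frac{p_{opt}(x)}{p(x - \theta)} < t \right\} = \begin{cases} (-\infty, T], & t < 2e^{T-\theta} \\ (-\infty, \infty), & t \ge 2e^{T-\theta} \end{cases}$$

For $t \ge 2e^{T-\theta}$, the inequality holds trivially. For $t < 2e^{T-\theta}$,
we minimise

$$I = \int_{-\infty}^{T} \frac{1}{2}\, e^{x - f(T)}\, dx \quad \text{wrt } f(T),\ \forall\, T.$$

This gives,

$$I' = \frac{1}{2}\, e^{T - f(T)}\, [1 - f'(T)] = 0, \quad \forall\, T.$$

$\Rightarrow f(T) = T + C$ and $\lim_{T \to \infty} \frac{f(T)}{T} = 1$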

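For the next case, i.e., $\theta \in (T, \infty)$,

$$\frac{p_{opt}(x)}{p(x - \theta)} = \begin{cases} 0, & x < T \\ 2e^{T + \theta - 2x}, & \theta \ge x \ge T \\ 2e^{T - \theta}, & x > \theta \end{cases}$$

Hence, we minimise

$$I = \int_{-\infty}^{T} \frac{1}{2}\, e^{x - f(T)}\, dx + \int_{\frac{T + T_0}{2} - \frac{1}{2} \ln \frac{t}{2}}^{f(T)} \frac{1}{2}\, e^{x - f(T)}\, dx + \int_{f(T)}^{\infty} \frac{1}{2}\, e^{f(T) - x}\, dx$$

for all T. Here, $p(x - T_0(T))$ is used to denote the member that
is least favourable with $p_{opt}(x)$. Then

$$I' = \frac{1}{2}\, e^{T - f(T)}\, [1 - f'(T)] - \frac{1}{2}\, e^{\frac{T + T_0}{2} - \frac{1}{2} \ln \frac{t}{2} - f(T)} \left[ \frac{1 + T_0'(T)}{2} - f'(T) \right] = 0$$

$\Rightarrow f'(T) = 1$, $T_0'(T) = 1$, $\forall\, T$,

i.e.

$$\lim_{T \to \infty} \frac{f(T)}{T} = \lim_{T \to \infty} \frac{T_0(T)}{T} = 1$$

QED.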
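The piecewise form of the likelihood ratio used in the two cases of this proof can be checked directly. The sketch below assumes the double exponential density p(x) = exp(-|x|)/2 and a threshold T > 0, for which the optimal density is exp(T - x) on x >= T; the function names are illustrative.

```python
import math

def laplace_pdf(x):
    """Double exponential density p(x) = exp(-|x|) / 2."""
    return 0.5 * math.exp(-abs(x))

def p_opt(x, t):
    """p(x) restricted to {x >= t} and renormalised, for a threshold t > 0."""
    return math.exp(t - x) if x >= t else 0.0

def likelihood_ratio(x, t, theta):
    """Ratio p_opt(x) / p(x - theta) for the shifted member with shift theta."""
    return p_opt(x, t) / laplace_pdf(x - theta)

# Case theta <= T: the ratio is constant, 2 exp(T - theta), on x >= T.
print(likelihood_ratio(3.5, 3.0, 2.0), 2.0 * math.exp(1.0))
# Case theta > T: the ratio decays on [T, theta] and is constant beyond theta.
print(likelihood_ratio(4.0, 3.0, 5.0), likelihood_ratio(7.0, 3.0, 5.0))
```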


The asymptotic optimality of shift through threshold
has previously been established only through ad hoc approaches
and brute-force simulation. See Orsak and Aazhang (1989),
Wessel et al. (1988), Beaulieu (1990), for example. Here,
using Huber's lemma and Kobayashi's theorem, we have proved it
rigorously.



CHAPTER - IV 


SEQUENTIAL TESTS


Siegmund (1976) has applied importance sampling in
the Monte Carlo estimation of error probabilities in sequential
tests. In this chapter, we study a very general class of
sequential tests called Sequential Minimum Probability Ratio
Tests introduced by Hall (1980) and, guided by Siegmund's
results, apply importance sampling to evaluate error
probabilities in this case. First, we briefly review Hall's
paper (1980).


4.1 HALL'S SEQUENTIAL MINIMUM PROBABILITY RATIO TESTS (MPRTs) :


Let $f_{in}$ (i = 0,1,2) be distinct alternative joint
densities of data $X_{(n)} = (X_1, \ldots, X_n)$ defined consistently for
n = 1,2,..., wrt a common dominating measure $\mu$. Call $f_0$ the
"intermediary" hypothesis chosen for the test of $f_1$ vs $f_2$.
Let $d_i$ (i = 1,2) be the decision that $\{f_i\}$ is correct. Choose
$\lambda_i$ (i = 1,2) > 0 such that $\sum \lambda_i = 1$. Define

$$l_n = \min_{i = 1,2} \left[ \frac{\lambda_i f_{in}}{f_{0n}} \right]$$
Hall's MPRT is a random vector (N,D) defined as follows :

Stopping rule N : $N = \inf \{ n : l_n \le \alpha \}$

Decision rule D : choose $d_2$ if $\lambda_1 f_{1N} \le \lambda_2 f_{2N}$, and $d_1$ otherwise.
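The stopping and decision rules can be sketched in code. The following minimal example assumes iid Gaussian hypotheses f_0 = N(0,1), f_1 = N(-m,1), f_2 = N(+m,1), chosen only for illustration, and works with log likelihood ratios for numerical stability; the function name and arguments are hypothetical.

```python
import math

def gaussian_mprt(xs, m, lam1, alpha):
    """Run the MPRT on the sequence xs; return (N, decision), or None if it never stops."""
    lam2 = 1.0 - lam1
    log_r1 = 0.0  # running log(f_1n / f_0n)
    log_r2 = 0.0  # running log(f_2n / f_0n)
    for n, x in enumerate(xs, start=1):
        log_r1 += -m * x - 0.5 * m * m
        log_r2 += m * x - 0.5 * m * m
        w1 = math.log(lam1) + log_r1  # log(lambda_1 f_1n / f_0n)
        w2 = math.log(lam2) + log_r2  # log(lambda_2 f_2n / f_0n)
        if min(w1, w2) <= math.log(alpha):  # stopping rule: l_n <= alpha
            # Decision rule: d_2 when lambda_1 f_1N <= lambda_2 f_2N, else d_1.
            return n, (2 if w1 <= w2 else 1)
    return None

print(gaussian_mprt([1.0] * 10, m=1.0, lam1=0.5, alpha=0.05))
```

On a run of observations all equal to +1, the weighted likelihood of f_1 falls below the threshold at the second sample and the test decides d_2.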


In the above, (N,D) describes a very wide class of
sequential tests including Wald's SPRT, Lorden's 2-SPRT,
Anderson's tests, sequential robust tests etc. as special
cases. In particular when $H_0$, $H_1$ and $H_2$ are iid, we have
Lorden's tests. The main advantage of the MPRT is that it
allows control over the weighted average of the two error
probabilities. This can be proved as follows.


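Let $S_{in}$ be the subset of the range of $X_{(n)}$ in which N
= n and $d_i$ is chosen. In $S_{2n}$, $\lambda_1 f_{1n} \le \lambda_2 f_{2n}$ so that
$\lambda_1 f_{1n} = l_n f_{0n} \le \alpha f_{0n}$. Hence,

$$\lambda_1 P_1(d_2) = \sum_n \int_{S_{2n}} \lambda_1 f_{1n} \le \alpha \sum_n \int_{S_{2n}} f_{0n} = \alpha P_0(d_2) \qquad (4.2A)$$

and similarly,

$$\lambda_2 P_2(d_1) \le \alpha P_0(d_1).$$

Adding the two, with $\alpha_1 = P_1(d_2)$ and $\alpha_2 = P_2(d_1)$ denoting the two error probabilities,

$$\lambda_1 \alpha_1 + \lambda_2 \alpha_2 \le \alpha \qquad (4.2B)$$

SPRT as a Special Case :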


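Let $f_{0n} = \mu_1 f_{1n} + \mu_2 f_{2n}$, $\mu_i > 0$, $\sum \mu_i = 1$. Let
SPRT (B,A) be the SPRT of $f_1$ vs $f_2$, deciding in favour of
$f_1$ ($f_2$) when $f_{2n}/f_{1n} \le B$ ($\ge A$), 0 < B < A. Wlog, let $\lambda_1 = \lambda$ and $\lambda_2
= 1 - \lambda$. Then, for every $\lambda \in \left( \frac{B}{1+B}, \frac{A}{1+A} \right)$, there exists $(\alpha, \lambda, \mu)$
for which

MPRT $(\alpha, \lambda, \mu)$ = SPRT (B,A).

Specifically,

$$\mu = \frac{a(\lambda)}{a(\lambda) + b(\lambda)} \qquad (4.3a)$$

where

$$a(\lambda) = B(1+A) \left[ \frac{A}{1+A} - \lambda \right] \quad \text{and} \quad b(\lambda) = (1+B) \left[ \lambda - \frac{B}{1+B} \right] \qquad (4.3b)$$

$$\mu_1 = \mu, \quad \mu_2 = 1 - \mu \quad \text{and} \quad \alpha = \frac{\lambda}{A + (1-A)\mu} \qquad (4.4)$$

For proof of the above and further discussions, see Hall
(1980).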
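The correspondence between SPRT (B,A) and the MPRT parameters can be checked numerically. The sketch below is an illustration under stated assumptions: it takes a and b as the bracketed expressions that appear in the formula for the mixture weight later in this chapter, and it recovers the SPRT thresholds by rearranging the two MPRT stopping inequalities for the mixture f_0n = mu f_1n + (1 - mu) f_2n. Both function names are hypothetical.

```python
import math

def mprt_parameters(A, B, lam):
    """Map SPRT thresholds (B, A) and weight lam to MPRT parameters (alpha, mu)."""
    if not B / (1 + B) < lam < A / (1 + A):
        raise ValueError("lam must lie in (B/(1+B), A/(1+A))")
    a = B * (1 + A) * (A / (1 + A) - lam)
    b = (1 + B) * (lam - B / (1 + B))
    mu = a / (a + b)
    alpha = lam / (A + (1 - A) * mu)
    return alpha, mu

def sprt_thresholds(alpha, lam, mu):
    """Thresholds on f_2n / f_1n implied by the MPRT stopping rule (assumed rearrangement)."""
    upper = (lam / alpha - mu) / (1 - mu)  # from lam * f_1n <= alpha * f_0n
    lower = alpha * mu / ((1 - lam) - alpha * (1 - mu))  # from (1 - lam) * f_2n <= alpha * f_0n
    return lower, upper

alpha, mu = mprt_parameters(A=9.0, B=1.0 / 9.0, lam=0.5)
print(alpha, mu, sprt_thresholds(alpha, 0.5, mu))
```

For A = 9, B = 1/9 and lam = 0.5 this gives mu = 0.5 and alpha = 0.1, and the round trip returns the original thresholds.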



4.2 IMPORTANCE SAMPLING FOR MPRTs :


We first review Siegmund's (1976) results on
importance sampling for sequential tests.

4.2.1 Siegmund's results for SPRTs :


Let $X_k$ be iid with a common distribution P such that

$$-\infty < E X_k < 0 \quad \text{and} \quad P\{X_k > 0\} > 0 \qquad (4.5)$$

Let $S_n = \sum_{k=1}^{n} X_k$ and for a < 0 < b, let

$$T = \inf \{ n : S_n \notin (a,b) \}.$$

To estimate $\alpha = P\{S_T \ge b\}$ using importance sampling through the
choice of a biasing density from the class

$$dP_\theta(x) = \exp[\theta x - \psi(\theta)]\, dH(x) \qquad (4.6)$$

where $\psi$ is normalised so that $\psi(0) = \psi'(0) = 0$, Siegmund gives
the following result :

Let $\theta_0$ be that value of $\theta$ such that $P = P_{\theta_0}$. By 4.5, $\theta_0
< 0$. By the convexity of $\psi$, there exists a $\theta_1 > 0$ for which
$\psi(\theta_1) = \psi(\theta_0)$. Then, as $b \to \infty$, for all $\theta \ne \theta_1$, $\mu_2(\theta_1)/\mu_2(\theta)$
converges to 0 at an exponential rate where $\mu_2(\theta)$ is the second
moment of the importance sampling estimate with $P_\theta$ as the
biasing density.

In particular, for the case of a simple hypothesis
against a simple alternative, if $P_{\theta_0}$ is the measure
corresponding to $H_0$ then, $P_{\theta_1}$ corresponds to $H_1$. See Siegmund
(1980) pp 14-19. Again, when the $X_k$'s are loglikelihood ratios,
we have Wald's SPRT.
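Siegmund's choice of biasing distribution can be sketched for a Gaussian random walk. The example assumes increments N(-m, 1), so that with H = N(0,1) one has psi(theta) = theta^2 / 2, theta_0 = -m and theta_1 = +m; the function name, boundaries and seed are illustrative. Because psi(theta_1) = psi(theta_0), the likelihood ratio along a path depends only on the stopped sum S_T.

```python
import math
import random

def siegmund_is_estimate(m, a, b, n_paths, seed=0):
    """Estimate P{S_T >= b} for N(-m, 1) increments by sampling N(+m, 1) increments."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        s = 0.0
        while a < s < b:
            s += rng.gauss(m, 1.0)  # increment drawn under theta_1 = +m
        if s >= b:
            # dP_{theta_0} / dP_{theta_1} along the path is
            # exp(-(theta_1 - theta_0) * S_T) = exp(-2 m S_T).
            total += math.exp(-2.0 * m * s)
    return total / n_paths

estimate = siegmund_is_estimate(m=0.5, a=-5.0, b=5.0, n_paths=2000)
print(estimate, math.exp(-2.0 * 0.5 * 5.0))
```

Every contributing weight is at most exp(-2 m b) since S_T >= b on the event, so the estimate never exceeds that bound.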

4.2.2 Biasing density for error estimation in MPRT : 


Going back to the notation of 4.1, the probability of
false alarm in the case of MPRT can be written (4.2A) as

$$\alpha_1 = \int_{d_2} f_{1N}\, d\nu^{(N)} \qquad (4.7)$$

where

$$d_2 = \{ \lambda_1 f_{1N}(X_1, \ldots, X_N) \le \lambda_2 f_{2N}(X_1, \ldots, X_N) \}$$

and

$$N = \inf \{ n : l_n \le \alpha \}.$$

Recall

$$\alpha_1 \le \frac{\alpha}{\lambda_1} \int_{d_2} f_{0N}(x_1, \ldots, x_N)\, d\nu^{(N)} \qquad (4.8)$$

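The unbiased crude Monte Carlo estimate for $\alpha_1$ when $f_{1N}$ is
sampled can be written as

$$\hat{\alpha}_1 = \frac{1}{M} \sum_{k=1}^{M} I \left\{ \log \left[ \frac{f_{0N_k}(X_k)}{f_{1N_k}(X_k)} \right] > k_1\ ;\ \log \left[ \frac{f_{2N_k}(X_k)}{f_{1N_k}(X_k)} \right] > k_2 \right\}$$

where I(B) is the indicator on the Borel set $B \in \mathcal{B}_n$, the Borel
field over $R^n$,

$$k_1 = \log \left[ \frac{\lambda_1}{\alpha} \right] \quad \text{and} \quad k_2 = \log \left[ \frac{\lambda_1}{\lambda_2} \right],$$

M is the number of trials, and $X_k = X_{(N_k)}$ is the data of the k-th trial.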


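Let us consider the following class of densities for
$X_{(n)}$, defined analogous to the $P_\theta$ class (4.6) :

$$f^*_{t_1,t_2}(x_1, \ldots, x_n) = \exp[ t_1 Z_{1n}(x_1, \ldots, x_n) + t_2 Z_{2n}(x_1, \ldots, x_n) - \psi_n(t_1, t_2) ]\, f_{0n}(x_1, \ldots, x_n) \qquad (4.9)$$

where

$$Z_{1n}(\cdot) = \log \left[ \frac{f_{0n}}{f_{1n}} \right], \qquad Z_{2n}(\cdot) = \log \left[ \frac{f_{2n}}{f_{1n}} \right]$$

and

$$\psi_n(t_1, t_2) = \log \int_{R^n} \exp[ t_1 Z_{1n} + t_2 Z_{2n} ]\, f_{0n}\, d\nu^{(n)}.$$

It is easy to see that this class includes $f_{0n}$, $f_{1n}$
and $f_{2n}$. Specifically,

$$\psi_n(0,0) = \psi_n(-1,0) = \psi_n(-1,1) = 0$$

and

$$f^*_{0,0}(x_1, \ldots, x_n) = f_{0n}(x_1, \ldots, x_n), \quad f^*_{-1,0}(x_1, \ldots, x_n) = f_{1n}(x_1, \ldots, x_n), \quad f^*_{-1,1}(x_1, \ldots, x_n) = f_{2n}(x_1, \ldots, x_n).$$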

We also observe that the actual choice of $f_{in}$ (i = 0,1,2) can
be arbitrary.


Theorem 5 :

Let $\mu_2(t_1', t_2')$ be the second moment of the importance
sampling estimate of $\alpha_1$ with $f^*_{t_1', t_2'}$ as the biasing density.
Then, $\mu_2(-1,1)/\mu_2(-1,0)$ converges to zero exponentially as
$k_2 \to \infty$.
Remarks :

i) The choice of $f_{2N}$ as the biasing density (i.e., $X_{(N)}$ is sampled
from $f_{2N}$ instead of $f_{1N}$) and the consequent use of the estimate

$$\hat{\alpha}_1 = \frac{1}{M} \sum_{k=1}^{M} \frac{\hat{f}_{1N_k}}{\hat{f}_{2N_k}}\, I \left\{ \log \left[ \frac{\hat{f}_{0N_k}}{\hat{f}_{1N_k}} \right] > k_1\ ;\ \log \left[ \frac{\hat{f}_{2N_k}}{\hat{f}_{1N_k}} \right] > k_2 \right\}$$

where $\hat{f}_{iN_k} = f_{iN_k}(X_k)$, i = 0, 1, 2,
gives an unbiased estimate of $\alpha_1$ at an exponentially faster
rate as compared to the use of $f_{1N}$.

ii) The justification for choosing $k_2$ as the asymptotic
parameter and not $k_1$, is as follows : From 4.1, we see that $k_1
= \log[\lambda_1/\alpha]$ is the termination threshold and $k_2 = \log[\lambda_1/\lambda_2]$
is the decision threshold. It is our aim here, as it has been
in all asymptotic works on importance sampling, to evaluate the
order of reduction in the sample-size asymptotically as the
quantity to be evaluated goes to zero. For the MPRT, the
probability of false alarm decreases as the margin of
decision $k_2$ increases. Therefore, whereas allowing $k_2 \to \infty$
results in $\alpha_1 \to 0$, $k_1 \to \infty$ will only affect the value of
E(N).
Formally, let $\sigma(l_n)$ be the $\sigma$-field generated by the
random variable

$$l_n = \min(X, Y), \quad \text{where } X = \frac{\lambda_1 f_{1n}}{f_{0n}} \text{ and } Y = \frac{\lambda_2 f_{2n}}{f_{0n}}.$$

Then the decision event, which is an event of the form $\{X \le Y\}$
or $\{X > Y\}$ is not measurable wrt $\sigma(l_n)$.

Proof of Theorem 5 :

This follows by the direct computation of the moments :
If $f^*_{N,(t_1,t_2)}$ is the biasing density, then,

$$\alpha_1 = \int_{\{ Z_{1N} > k_1\ ;\ Z_{2N} > k_2 \}} \frac{f_{1N}}{f^*_{N,(t_1,t_2)}}\, \exp[ t_1 Z_{1N} + t_2 Z_{2N} - \psi_N(t_1, t_2) ]\, dF_{0N}(x_1, \ldots, x_N) \qquad (4.10)$$

The second moment of the estimate of $\alpha_1$ as expressed in (4.10)
above is :

$$\mu_2(t_1, t_2) = \int_{\{ Z_{1N} > k_1\ ;\ Z_{2N} > k_2 \}} \exp[ -(2 + t_1) Z_{1N} - t_2 Z_{2N} + \psi_N(t_1, t_2) ]\, dF_{0N}$$

Then the result directly follows :

$$\mu_2(-1,1) = \int_{\{ Z_{1N} > k_1\ ;\ Z_{2N} > k_2 \}} \exp[ -Z_{2N} ]\, dF_{1N} \le \exp(-k_2) \int_{\{ Z_{1N} > k_1\ ;\ Z_{2N} > k_2 \}} dF_{1N} = \exp(-k_2)\, \mu_2(-1,0).$$

QED


AN EXAMPLE : 

Let us consider the Gaussian case of $f_0(x) = N(0,1)$,
$f_1(x) = N(-\mu,1)$ and $f_2(x) = N(+\mu,1)$ where the observations are
iid. Then,

$$Z_1(x) = \frac{\mu^2}{2} + \mu x \quad \text{and} \quad Z_2(x) = 2\mu x$$

$$\psi(t_1, t_2) = \frac{\mu^2}{2} \left[ t_1 + (t_1 + 2 t_2)^2 \right]$$

$$\therefore\ \psi(0,0) = \psi(-1,0) = \psi(-1,1) = 0$$

and $\mu_2(-1,1) \le \exp(-k_2)\, \mu_2(-1,0)$ is also verified.
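The Gaussian example can be verified numerically for a single observation. The sketch below assumes the tilted class has the form exp[t1 Z1(x) + t2 Z2(x) - psi(t1, t2)] f_0(x) with Z1, Z2 and psi as in this example, and checks that the parameter pairs (-1, 0) and (-1, 1) reproduce N(-mu, 1) and N(+mu, 1); the helper names are illustrative.

```python
import math

def normal_pdf(x, mean):
    """Density of N(mean, 1) at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def psi(t1, t2, mu):
    """Log moment generating function of (Z1, Z2) under f_0 = N(0, 1)."""
    return 0.5 * mu * mu * (t1 + (t1 + 2.0 * t2) ** 2)

def tilted_pdf(x, t1, t2, mu):
    """Member (t1, t2) of the exponentially tilted class built on f_0."""
    z1 = 0.5 * mu * mu + mu * x  # log(f_0 / f_1)
    z2 = 2.0 * mu * x            # log(f_2 / f_1)
    return math.exp(t1 * z1 + t2 * z2 - psi(t1, t2, mu)) * normal_pdf(x, 0.0)

mu = 1.5
print(psi(0, 0, mu), psi(-1, 0, mu), psi(-1, 1, mu))
print(tilted_pdf(0.3, -1, 1, mu), normal_pdf(0.3, mu))
```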


4.2.3 SPRT as a special case of MPRT :


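When, under the conditions listed in (4.3) and (4.4),
MPRT reduces to SPRT, we show that the result in Theorem 5
remains consistent.

Lemma : With $f_{0n} = \mu_1 f_{1n} + \mu_2 f_{2n}$ and conditions (4.3) and
(4.4), when SPRT (B,A) = MPRT $(\alpha, \lambda, \mu)$, $A \to \infty$ if $k_1$ and $k_2
\to \infty$.

Proof : Here,

$$k_1 = \log \left[ \frac{\lambda}{\alpha} \right], \quad k_2 = \log \left[ \frac{\lambda}{1 - \lambda} \right] \quad \text{and} \quad \alpha = \frac{\lambda}{A + (1-A)\mu}.$$

Substituting for $\mu$ from 4.3, we get,

$$A = \frac{\lambda - B(1 - \alpha)}{\alpha + B(\lambda - 1)} \qquad (4.12)$$

Keeping B fixed,

$$A \to \infty \text{ when } \lambda \to 1 \text{ and } \alpha \to 0. \qquad (4.13)$$

From (4.12) and (4.13), the result follows.

QED.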


As $k_1 \to \infty$ and $k_2 \to \infty$, the MPRT termination and
decision rules merge into the SPRT termination-decision rule
for $f_{0n} = \mu_1 f_{1n} + \mu_2 f_{2n}$ and remain consistent :

From (4.3) and (4.4) we have $\mu_2 = 1 - \mu$ where

$$\mu = \frac{B(1+A) \left[ \frac{A}{1+A} - \lambda \right]}{B(1+A) \left[ \frac{A}{1+A} - \lambda \right] + (1+B) \left[ \lambda - \frac{B}{1+B} \right]}$$
Then, as A — > oo, /j — > 1 


f merges with . 

on 2n 


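Further, for the choice $f_{0n} = \mu f_{1n} + (1 - \mu) f_{2n}$, the
termination rule reduces to

$$\left[ \frac{f_{2n}}{f_{1n}} \right] \ge \frac{1}{(1 - \mu)} \left[ \frac{\lambda}{\alpha} - \mu \right]$$

and the decision rule to

$$\left[ \frac{f_{2n}}{f_{1n}} \right] \ge \left[ \frac{\lambda}{1 - \lambda} \right]$$

which for $\lambda \to 1$ imply that the upper threshold $A \to \infty$.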



CHAPTER - V 


CONCLUSION 


For some parametric families, we have shown that the
solution to the equation

$$p^* = \arg\min_{p \in \mathcal{P}} d(p_{opt}, p)$$

is easier to obtain with the Kullback-Leibler distance than
with the importance sampling distance as a choice for d. Orsak
and Aazhang (1990) have considered the mixture neighbourhood
for $\mathcal{P}$, but the least favourable density is dependent on $p_{opt}$.
A suitable implementable modification of this density is
desirable. One can further consider other neighbourhoods like
the total variational neighbourhood, which are topological
neighbourhoods and are therefore much fuller and may yield
better solutions. But, here again, the solutions should be
modified and made independent of $p_{opt}$, since it depends on the
(unknown) quantity to be estimated.

We have established the asymptotic optimality of a
member of the location family that is shifted by the threshold.
A similar result for the location-scale family is desirable since
it performs better than the location family.

In case of MPRTs, we have only shown that the use of
the density of the alternative hypothesis for biasing gives a
large reduction in the sample-size. The question of optimality
remains open. Further, one still needs to consider the
multiple decision case as discussed in Hall (1980).
APPENDIX - A


Here, we outline the basic properties of Ali-Silvey 
distances. For further details, see Ali and Silvey (1966). 

Let $(\Omega, F)$ be a measurable space and $P_1$, $P_2$ two
probability measures on it. Let $\phi$, the generalised
Radon-Nikodym derivative of $P_2$ with respect to $P_1$, exist. If C
is a Borel-measurable continuous convex function of a real
variable and f is a Borel-measurable increasing real-valued
function of a real variable, the coefficient

$$d(P_1, P_2) = f[ E_1 \{ C(\phi) \} ] = f \left[ \int C(\phi)\, dP_1 + P_2(N) \lim_{\phi \to \infty} \frac{C(\phi)}{\phi} \right]$$

where N is a $P_1$-null set, is called the Ali-Silvey distance
measure of divergence of $P_2$ from $P_1$ in the sense that it enjoys
the following 4 properties :

i) $d(P_1, P_2)$ is defined for all pairs of measures $P_1$ and $P_2$ on
the same sample space.

ii) If y = t(x) is a measurable transformation from $(\Omega, F)$ onto
a measure space $(\mathcal{Y}, \mathcal{G})$ then,

$$d(P_1, P_2) \ge d(P_1 t^{-1}, P_2 t^{-1})$$

where $P_i t^{-1}$ is the induced measure on $\mathcal{Y}$.

iii) $d(P_1, P_2)$ is minimum for $P_1 = P_2$ and maximum for $P_1 \perp P_2$.

iv) Let $\theta$ be a real parameter and let $\{ P_\theta : \theta \in (a,b) \}$ be a
family of mutually absolutely continuous distributions on
the real line such that the family of densities $p_\theta(x)$ wrt
a fixed measure $\mu$ has a monotone likelihood ratio in x.
Then if $a < \theta_1 < \theta_2 < \theta_3 < b$, we have

$$d(P_{\theta_1}, P_{\theta_2}) \le d(P_{\theta_1}, P_{\theta_3}).$$